    Whisper: Fast Flooding for Low-Power Wireless Networks

    This paper presents Whisper, a fast and reliable protocol for flooding small amounts of data through a multi-hop network. Whisper rests on three cornerstones. First, it embeds the message to be flooded into a signaling packet composed of multiple packlets. A packlet is a portion of the message payload that mimics the structure of an actual packet, so a node needs to intercept only one packlet to know that a transmission is ongoing. Second, Whisper exploits the structure of the signaling packet to reduce idle listening and, thus, the radio-on time of the nodes. Third, it relies on synchronous transmissions to flood the signaling packet quickly through the network. Our evaluation on the FlockLab testbed shows that Whisper achieves reliability comparable to that of Glossy, a state-of-the-art flooding protocol, with significantly lower radio-on time. Specifically, Whisper disseminates data in FlockLab twice as fast as Glossy with no loss in reliability. Further, Whisper spends 30% less time in channel sampling than Glossy when no data traffic must be disseminated.
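
    To make the packlet idea concrete, here is a minimal Python sketch that assumes a hypothetical packlet layout (an index, the packlet count, and a payload chunk). The abstract does not specify Whisper's actual on-air format, and radio timing, channel sampling, and synchronous transmissions are not modeled; the sketch only shows why catching a single self-describing packlet is enough to detect a transmission and know how much of it remains.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packlet:
    """Hypothetical packlet layout: a small header plus one payload chunk."""
    index: int    # position of this packlet within the signaling packet
    total: int    # number of packlets in the signaling packet
    chunk: bytes  # slice of the message payload carried by this packlet


def build_signaling_packet(message: bytes, chunk_size: int) -> List[Packlet]:
    """Split a message into packlets that each carry their own header."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [message[i:i + chunk_size]
              for i in range(0, len(message), chunk_size)] or [b""]
    return [Packlet(i, len(chunks), c) for i, c in enumerate(chunks)]


def remaining_packlets(p: Packlet) -> int:
    """Decoding any single packlet reveals that a transmission is ongoing
    and how many packlets still follow it."""
    return p.total - p.index - 1


def reassemble(packlets: List[Packlet]) -> Optional[bytes]:
    """Rebuild the message once every packlet index has been received."""
    if not packlets:
        return None
    total = packlets[0].total
    by_index = {p.index: p.chunk for p in packlets}
    if len(by_index) != total:
        return None  # some packlets are still missing
    return b"".join(by_index[i] for i in range(total))
```

    For example, `build_signaling_packet(b"hello whisper", 4)` yields four packlets, and a node that catches only the second one (`index == 1`) learns that two more packlets follow.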

    3rd Many-core Applications Research Community (MARC) Symposium (KIT Scientific Reports; 7598)

    This manuscript includes recent scientific work on the Intel Single-chip Cloud Computer and describes novel approaches for programming and run-time organization.

    Adaptive Multiclient Network-on-Chip Memory Core: Hardware Architecture, Software Abstraction Layer, and Application Exploration

    This paper presents the hardware architecture and the software abstraction layer of an adaptive multiclient Network-on-Chip (NoC) memory core. The memory core supports the flexibility of a heterogeneous FPGA-based runtime adaptive multiprocessor system called RAMPSoC. The processing elements, also called clients, access the memory core via the NoC. The memory core supports dynamic mapping of an address space for the different clients as well as different data transfer modes, such as variable burst sizes. In this way, two main limitations of FPGA-based multiprocessor systems are mitigated: the restricted on-chip memory resources and the fact that usually only one physical channel to off-chip memory exists. Furthermore, a software abstraction layer is introduced that hides the complexity of the memory core architecture and provides an easy-to-use interface for the application programmer. Finally, the advantages of the novel memory core in terms of performance, flexibility, and user-friendliness are shown using a real-world image processing application.
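
    The per-client address mapping and variable burst sizes can be pictured with a small Python model. It is a sketch under assumptions: the names `AddressMapper`, `map_client`, `translate`, and `split_into_bursts` are hypothetical and are not the API of the paper's software abstraction layer, and the bump allocator is simply the most basic placement policy one could choose.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Region:
    base: int  # first physical address of the client's window in off-chip memory
    size: int  # window size in bytes


class AddressMapper:
    """Hypothetical per-client address translation table."""

    def __init__(self, memory_size: int) -> None:
        self.memory_size = memory_size
        self.regions: Dict[int, Region] = {}
        self.next_free = 0  # bump allocator: freed space is never reused

    def map_client(self, client_id: int, size: int) -> Region:
        """Give a client its own window; calling again remaps it at runtime."""
        if size <= 0 or self.next_free + size > self.memory_size:
            raise ValueError("cannot map a window of this size")
        region = Region(self.next_free, size)
        self.regions[client_id] = region
        self.next_free += size
        return region

    def translate(self, client_id: int, local_addr: int) -> int:
        """Map a client-local address to a physical off-chip address."""
        region = self.regions[client_id]
        if not 0 <= local_addr < region.size:
            raise IndexError("address outside the client's window")
        return region.base + local_addr


def split_into_bursts(start: int, length: int,
                      max_burst: int) -> List[Tuple[int, int]]:
    """Split a transfer into (address, length) bursts of at most max_burst bytes."""
    if max_burst <= 0:
        raise ValueError("max_burst must be positive")
    bursts = []
    while length > 0:
        n = min(length, max_burst)
        bursts.append((start, n))
        start += n
        length -= n
    return bursts
```

    Each client then works with addresses starting at zero, while the single physical off-chip channel is shared through the translation table.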

    Exploration of the power-performance tradeoff through parameterization of FPGA-based multiprocessor systems

    The design space of FPGA-based processor systems is huge, because many parameters can be modified at design time and at runtime to achieve a system that is efficient in terms of performance, power, and energy consumption. Such parameters include, for example, the number of processors and their configurations, the clock frequencies chosen at design time, the use of dynamic frequency scaling at runtime, the distribution of application tasks, and the FPGA type and size. The major contribution of this paper is the exploration of all these parameters and their impact on performance, power dissipation, and energy consumption for four different application scenarios. The goal is to provide a first step toward a developer's guideline that supports choosing an optimized, application-specific system parameterization for FPGA-based multiprocessor systems-on-chip. The FPGAs used for these explorations were the Xilinx Virtex-4 and Virtex-5. Performance was measured on the FPGA, while power consumption was estimated with the Xilinx XPower Analyzer tool. Finally, a novel runtime adaptive multiprocessor architecture for dynamic clock frequency scaling is introduced and used for the performance, power, and energy consumption evaluations.
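
    The power-performance tradeoff rests on the relation E = P × t: energy is average power multiplied by execution time. The following Python sketch compares configurations by runtime and by energy. The `Config` record and its fields are hypothetical; in practice the runtime would come from measurements and the power from an estimation tool, as described above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Config:
    name: str
    power_w: float    # average power in watts (e.g. a tool estimate)
    runtime_s: float  # measured execution time in seconds

    @property
    def energy_j(self) -> float:
        # Energy is average power multiplied by execution time: E = P * t.
        return self.power_w * self.runtime_s


def fastest(configs: Iterable[Config]) -> Config:
    """Configuration with the shortest execution time."""
    return min(configs, key=lambda c: c.runtime_s)


def most_energy_efficient(configs: Iterable[Config]) -> Config:
    """Configuration with the lowest energy per run."""
    return min(configs, key=lambda c: c.energy_j)
```

    Because energy is power times time, the fastest configuration is not necessarily the most energy-efficient one: a configuration that takes twice as long but draws less than half the power consumes less energy.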

    ASIR: Application-Specific Instruction-Set Router for NoC-Based MPSoCs

    The end of Dennard scaling led to the use of heterogeneous multiprocessor systems-on-chip (MPSoCs). Heterogeneous MPSoCs offer high efficiency in terms of energy and performance because each processing element can be optimized for an application task. However, MPSoCs integrate a growing number of processing elements (PEs), which leads to tremendous communication costs that tend to become the performance bottleneck. Networks-on-chip (NoCs) are a promising and scalable intra-chip communication technology for MPSoCs. However, exploiting these technological advances efficiently requires novel and effective programming methodologies. This work presents a novel router architecture called application-specific instruction-set router (ASIR) for field-programmable gate array (FPGA)-based MPSoCs. It combines data transfers with application-specific processing by adding processing units generated by high-level synthesis to the routers of the NoC. Executing application-specific operations while data is exchanged between PEs makes efficient use of the transmission time. Furthermore, the processing units can be programmed in C/C++ using high-level synthesis and can therefore be optimized specifically for an application. This approach enables transferred data to be processed by a processing element, such as a MicroBlaze processor, before transmission, or by a router during transmission. Moreover, a static mapping algorithm is introduced for applications modeled as Kahn process network-based graphs; it maps tasks to the MicroBlaze processors and processing units. The mapping algorithm optimizes the communication cost by allocating tasks to nearest neighboring PEs. This complete methodology significantly simplifies the design and programming of ASIR-based MPSoCs. Furthermore, it efficiently exploits the heterogeneous processing capabilities inside the routers and MicroBlaze processors.
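
    The abstract states only that the mapping allocates communicating tasks to nearest neighboring PEs. As an illustration, here is a simple greedy heuristic in Python, assuming a 2D mesh NoC, one task per PE, and volume-weighted hop count as the communication cost. It is a swapped-in heuristic, not necessarily the paper's algorithm, and it ignores the distinction between MicroBlaze processors and in-router processing units; all names are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]                # (x, y) position of a PE's router in the mesh
Channels = Dict[Tuple[str, str], int]  # (producer, consumer) -> data volume


def hops(a: Coord, b: Coord) -> int:
    """Manhattan distance, i.e. the hop count of a minimal route in a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def placement_cost(task: str, pe: Coord, placement: Dict[str, Coord],
                   channels: Channels) -> int:
    """Volume-weighted hops between `task` on `pe` and its already-placed peers."""
    total = 0
    for (src, dst), volume in channels.items():
        if src == task and dst in placement:
            total += volume * hops(pe, placement[dst])
        elif dst == task and src in placement:
            total += volume * hops(pe, placement[src])
    return total


def greedy_map(tasks: List[str], channels: Channels,
               pes: List[Coord]) -> Dict[str, Coord]:
    """Place tasks in the given order, one per PE, each on the free PE
    closest (volume-weighted) to the tasks it communicates with."""
    if len(tasks) > len(pes):
        raise ValueError("more tasks than processing elements")
    placement: Dict[str, Coord] = {}
    free = list(pes)
    for task in tasks:
        # min() keeps the first PE in `free` on ties, so results are deterministic.
        best = min(free, key=lambda pe: placement_cost(task, pe, placement, channels))
        placement[task] = best
        free.remove(best)
    return placement


def total_comm_cost(placement: Dict[str, Coord], channels: Channels) -> int:
    """Sum of volume * hops over all channels of the mapped process network."""
    return sum(v * hops(placement[s], placement[d]) for (s, d), v in channels.items())
```

    A greedy pass like this is fast but order-dependent; an exhaustive or ILP-based mapper could find better placements at much higher cost.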

    New dimensions in design space and runtime adaptivity for multiprocessor systems through dynamic and partial reconfiguration: The RAMPSoC approach

    Embedded high-performance computing applications have two requirements that can hardly be achieved simultaneously: high performance and low energy consumption. One solution is to exploit the low-level parallelism of a field-programmable gate array (FPGA). Thanks to its many tunable parameters, such as adapting the clock frequency to the application requirements, an FPGA can achieve better energy efficiency than traditional processor-based platforms. However, FPGA programming is still time-consuming and requires a very good understanding of the underlying hardware. C-to-FPGA tools exist that ease traditional FPGA programming in hardware description languages (HDLs). However, these tools can only be used for submodules and accelerators, because they do not handle communication with the environment, e.g., camera interfaces or PCI interfaces. Furthermore, the results of automatic code transformation are still suboptimal compared to a hand-coded design. Consequently, the interfaces must either be programmed by hand, which is very time-consuming, or be bought from IP suppliers. In addition, C-to-FPGA tools often restrict the C/C++ input they accept. This book chapter presents a novel holistic approach called RAMPSoC (Runtime Adaptive Multiprocessor System-on-Chip). RAMPSoC provides a meet-in-the-middle solution by combining the hardware flexibility and low power consumption of FPGAs with the software flexibility and high-level programming paradigms of multiprocessor systems-on-chip. The RAMPSoC approach comprises a flexible and energy-efficient hardware architecture, consisting of heterogeneous processing elements connected by a heterogeneous Star-Wheels Network-on-Chip; a user-guided design methodology; and a new operating system for runtime resource management.
    RAMPSoC opens new dimensions of design space and runtime adaptivity by exploiting dynamic and partial reconfiguration in FPGA-based designs. Using an object recognition algorithm, it was shown that RAMPSoC is more energy-efficient than a standard CPU and an NVIDIA Tesla GPU.
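
    Adapting the clock frequency to the application requirements can be as simple as picking the slowest clock that still meets a deadline, since dynamic CMOS power grows with clock frequency. The Python sketch below shows that selection step only; the function name is hypothetical, and it assumes execution time scales as cycles divided by frequency, which ignores frequency-independent delays such as off-chip memory latency.

```python
from typing import Optional, Sequence


def lowest_sufficient_frequency(cycles: int, deadline_s: float,
                                available_hz: Sequence[float]) -> Optional[float]:
    """Return the lowest available clock frequency at which `cycles` clock
    cycles of work finish within `deadline_s` seconds, or None if even the
    fastest available clock misses the deadline.

    Assumes execution time = cycles / frequency (compute-bound work)."""
    feasible = [f for f in available_hz if f > 0 and cycles / f <= deadline_s]
    return min(feasible) if feasible else None
```

    For instance, two million cycles with a 25 ms deadline cannot be met at 50 MHz (40 ms) but can at 100 MHz (20 ms), so 100 MHz is selected from a set of 25, 50, 100, and 200 MHz.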

    Scheduling and Communication-Aware Mapping of HW/SW Modules for Dynamic Partial Reconfigurable SoC Architectures

    In this paper, we present an approach for the simultaneous scheduling and placement of communicating modules for SoC architectures that include devices with partial reconfiguration support and at least one CPU. This approach includes (a) a detailed model of inter-module communication and an optimization model for finding the best temporal and spatial placement of modules on either the CPU or the reconfigurable device, taking communication and reconfiguration time overheads into account; (b) a real SoC platform for slot-based module relocation and on-chip inter-module communication, called ESM; and (c) experimental data obtained on this platform. Existing approaches either neglect inter-module communication, cannot solve the related problem, or do not provide real applications implemented on real platforms.
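
    The paper formulates placement as an optimization model. As a much simpler stand-in, the following Python sketch greedily places each module on the CPU or on a reconfigurable slot, whichever finishes it earliest once reconfiguration and communication overheads are charged. Assumptions: modules arrive in topological order, slots are interchangeable, every hardware placement pays a fixed reconfiguration time, and one fixed delay models any data transfer between different resources. `Module` and `schedule` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Module:
    name: str
    cpu_time: float  # execution time as software on the CPU
    hw_time: float   # execution time as a hardware module in a slot
    preds: List[str] = field(default_factory=list)  # modules whose output it consumes


def schedule(modules: List[Module], num_slots: int, reconf_time: float,
             comm_time: float) -> Dict[str, Tuple[str, float, float]]:
    """Greedy earliest-finish-time placement on one CPU plus `num_slots`
    reconfigurable slots. `modules` must be in topological order.
    Returns name -> (resource, execution start, finish)."""
    resources = ["cpu"] + [f"slot{i}" for i in range(num_slots)]
    free_at = {r: 0.0 for r in resources}  # when each resource becomes idle
    result: Dict[str, Tuple[str, float, float]] = {}
    for m in modules:
        options = []
        for r in resources:
            # Inputs from a predecessor on another resource pay a transfer delay.
            ready = max((result[p][2] + (0.0 if result[p][0] == r else comm_time)
                         for p in m.preds), default=0.0)
            if r == "cpu":
                start = max(ready, free_at[r])
                finish = start + m.cpu_time
            else:
                # The slot is reconfigured as soon as it is idle; execution
                # waits for both the new bitstream and the input data.
                start = max(ready, free_at[r] + reconf_time)
                finish = start + m.hw_time
            options.append((finish, r, start))
        finish, r, start = min(options)
        result[m.name] = (r, start, finish)
        free_at[r] = finish
    return result
```

    With a short reconfiguration time, modules migrate to slots; as it grows, the CPU wins, which is exactly the tradeoff the optimization model has to weigh.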

    SDMPSoC: Software-Defined MPSoC for FPGAs

    Bridging the Gap between Relocatability and Available Technology: The Erlangen Slot Machine

    We present an FPGA-based reconfigurable platform called the Erlangen Slot Machine (ESM). The main advantages of this platform are, first, that each module can access peripherals independently of its location through a programmable crossbar and, second, that individual modules have local SRAM banks. This physical design eases the implementation of run-time reconfigurable partial modules and enables unrestricted relocation of modules on the device. We present our two-board ESM implementation and demonstrate a partially reconfigurable video filter application as well as a relocatable computer game that includes a dedicated inter-module communication scheme.
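
    The role of the programmable crossbar can be modeled in a few lines of Python. This is a sketch under assumptions: the `Crossbar` class, the peripheral names, and the slot numbering are hypothetical, and one table entry per peripheral stands in for the actual routing of physical signals.

```python
from typing import Dict, Iterable, Optional


class Crossbar:
    """Hypothetical model of a programmable crossbar that routes each physical
    peripheral to whichever slot currently hosts the module using it."""

    def __init__(self, peripherals: Iterable[str], num_slots: int) -> None:
        self.num_slots = num_slots
        self.routes: Dict[str, Optional[int]] = {p: None for p in peripherals}

    def _check_slot(self, slot: int) -> None:
        if not 0 <= slot < self.num_slots:
            raise ValueError(f"no slot {slot}")

    def connect(self, peripheral: str, slot: int) -> None:
        """Route a peripheral to a slot (KeyError if the peripheral is unknown)."""
        self._check_slot(slot)
        if peripheral not in self.routes:
            raise KeyError(peripheral)
        self.routes[peripheral] = slot

    def relocate(self, old_slot: int, new_slot: int) -> None:
        """After a module moves, re-point all of its peripherals. Only the
        crossbar table changes; the module's own logic stays untouched."""
        self._check_slot(old_slot)
        self._check_slot(new_slot)
        for peripheral, slot in self.routes.items():
            if slot == old_slot:
                self.routes[peripheral] = new_slot
```

    Relocating a module from slot 0 to slot 2 then only rewrites the routes of the peripherals it was using, which is what makes location-independent modules possible.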